Native Language Identification Using Spectral and Source-Based Features

نویسندگان

Avni Rajpal

Tanvina B. Patel

Hardik B. Sailor

Maulik C. Madhavi

Hemant A. Patil

Hiroya Fujisaki

چکیده

The task of native language (L1) identification from nonnative language (L2) can be thought of as the task of identifying the common traits that each group of L1 speakers maintains while speaking L2 irrespective of the dialect or region. Under the assumption that speakers are L1 proficient, non-native cues in terms of segmental and prosodic aspects are investigated in our work. In this paper, we propose the use of longer duration cepstral features, namely, Mel frequency cepstral coefficients (MFCC) and auditory filterbank features learnt from the database using Convolutional Restricted Boltzmann Machine (ConvRBM) along with their delta and shifted delta features. MFCC and ConvRBM gave accuracy of 38.2% and 36.8%, respectively, on the development set provided for the ComParE 2016 Nativeness Task using Gaussian Mixture Model (GMM) classifier. To add complementary information about the prosodic and excitation source features, phrase information and its dynamics extracted from the log(F0) contour of the speech was explored. The accuracy obtained using score-level fusion between system features (MFCC and ConvRBM) and phrase features were 39.6% and 38.3%, respectively, indicating that phrase information and MFCC capture complementary information than ConvRBM alone. Furthermore, score-level fusion of MFCC, ConvRBM and phrase improves the accuracy to 40.2%.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

مقایسه روش های طیفی برای شناسایی زبان گفتاری

Identifying spoken language automatically is to identify a language from the speech signal. Language identification systems can be divided into two categories, spectral-based methods and phonetic-based methods. In the former, short-time characteristics of speech spectrum are extracted as a multi-dimensional vector. The statistical model of these features is then obtained for each language. The ...

متن کامل

Comparison of Open Source Learning Management Softwares and Presenting a Native Evaluation Tool

Introduction: Nowadays all educational institutes are trying to use technology in their structure. This effort has been faced with different barriers, including cost, time, and support. Therefore, using open source softwares can partially help us in using technology. In this article, we review main features of several open source learning management softwares, while presenting a tool which incl...

متن کامل

Improving Native Language Identification by Using Spelling Errors

In this paper, we explore spelling errors as a source of information for detecting the native language of a writer, a previously under-explored area. We note that character n-grams from misspelled words are very indicative of the native language of the author. In combination with other lexical features, spelling error features lead to 1.2% improvement in accuracy on classifying texts in the TOE...

متن کامل

Language Identification Using Excitation Source Features

This book discusses the contribution of excitation source information in discriminating language. The authors focus on the excitation source component of speech for enhancement of language identification (LID) performance. Language specific features are extracted using two different modes: (i) Implicit processing of linear prediction (LP) residual and (ii) Explicit parameterization of linear pr...

متن کامل